AITopics | perturbation text

Information about AI from the News, Publications, and Conferences

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Chen, Cong, Liu, Mingyu, Jing, Chenchen, Zhou, Yizhou, Rao, Fengyun, Chen, Hao, Zhang, Bo, Shen, Chunhua

arXiv.org Artificial Intelligence, Mar-9-2025

This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. To tackle the challenge, we identify the current lack of a metric that finely measures caption quality at the concept level. We therefore introduce HalFscore, a novel metric built upon the language graph and designed to evaluate both the accuracy and completeness of dense captions at a granular level. Additionally, we identify the root cause of hallucination as the model's over-reliance on its language prior. To address this, we propose PerturboLLaVA, which reduces the model's reliance on the language prior by incorporating adversarially perturbed text during training. This method enhances the model's focus on visual inputs, effectively reducing hallucinations and producing accurate, image-grounded descriptions without incurring additional computational overhead. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations and achieving improved performance across general multimodal benchmarks.
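The abstract describes a metric that scores both the accuracy and the completeness of a caption at the concept level. As a rough illustration only, the sketch below computes precision, recall, and F1 over two plain sets of concept strings. This is not HalFscore itself: the abstract says the metric is built on a language graph, and the set-based simplification, the function name `concept_f_score`, and the example concepts are all assumptions made here.

```python
def concept_f_score(predicted, reference):
    """Precision, recall, and F1 over two collections of concept strings.

    Assumption: concepts are compared by exact string match. The paper's
    metric is graph-based; plain sets are used here only for illustration.
    """
    predicted, reference = set(predicted), set(reference)
    matched = predicted & reference

    # Accuracy side: how many predicted concepts are actually in the image.
    precision = len(matched) / len(predicted) if predicted else 0.0
    # Completeness side: how many reference concepts the caption covered.
    recall = len(matched) / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: "frisbee" is hallucinated, "tree" and "sky" are missed.
caption_concepts = {"dog", "ball", "grass", "frisbee"}
image_concepts = {"dog", "ball", "grass", "tree", "sky"}
precision, recall, f1 = concept_f_score(caption_concepts, image_concepts)
print(precision, recall, f1)
```

In this framing, a hallucinated concept lowers precision while an omitted concept lowers recall, which is why a single combined score can reflect both failure modes.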

conference paper, hallucination, node

arXiv:2503.06486

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Tennis (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
Transportation (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Exploring the Multilingual NLG Evaluation Abilities of LLM-Based Evaluators

Chang, Jiayi, Gao, Mingqi, Hu, Xinyu, Wan, Xiaojun

arXiv.org Artificial Intelligence, Mar-6-2025

Previous research has shown that LLMs have potential in multilingual NLG evaluation tasks. However, existing research has not fully explored the differences in the evaluation capabilities of LLMs across different languages. To this end, this study provides a comprehensive analysis of the multilingual evaluation performance of 10 recent LLMs, spanning high-resource and low-resource languages, through correlation analysis, perturbation attacks, and fine-tuning. We found that 1) excluding the reference answer from the prompt and using large-parameter LLM-based evaluators lead to better performance across various languages; 2) most LLM-based evaluators show a higher correlation with human judgments in high-resource languages than in low-resource languages; 3) in the languages where they are most sensitive to perturbation attacks, they also tend to exhibit the highest correlation with human judgments; and 4) fine-tuning with data from a particular language yields a broadly consistent enhancement in the model's evaluation performance across diverse languages. Our findings highlight the imbalance in LLMs' evaluation capabilities across different languages and suggest that low-resource language scenarios deserve more attention.
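The correlation analysis mentioned in the abstract compares evaluator scores with human judgments. The abstract does not say which correlation coefficient the study uses, so the sketch below assumes Spearman rank correlation, a common choice for this kind of comparison, and implements it in plain Python. The function names and the example scores are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five generated texts in one language.
evaluator_scores = [4.5, 3.0, 2.0, 4.0, 1.0]
human_scores = [5, 3, 2, 4, 1]
print(spearman(evaluator_scores, human_scores))
```

Computing this coefficient separately per language would give one way to compare evaluator agreement with humans across high-resource and low-resource settings.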

original sentence, perturbation text, yingluck shinawatra

arXiv:2503.0436

Country:

Asia > Thailand (0.05)
Asia > Indonesia > Bali (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry: Government (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Mapping the Mind of an Instruction-based Image Editing using SMILE

Dehghani, Zeinab, Aslansefat, Koorosh, Khan, Adil, Rivera, Adín Ramírez, George, Franky, Khalid, Muhammad

arXiv.org Artificial Intelligence, Dec-20-2024

Despite recent advancements in Instruction-based Image Editing models for generating high-quality images, these models are regarded as black boxes, which is a significant barrier to transparency and user trust. To address this issue, we introduce SMILE (Statistical Model-agnostic Interpretability with Local Explanations), a novel model-agnostic method for localized interpretability that provides a visual heatmap to clarify the textual elements' influence on image-generating models. We applied our method to various Instruction-based Image Editing models, such as Pix2Pix, Image2Image-turbo, and Diffusers-Inpaint, and showed how it can improve interpretability and reliability. We also evaluate our method using stability, accuracy, fidelity, and consistency metrics. These findings indicate the exciting potential of model-agnostic interpretability for reliability and trustworthiness in critical applications such as healthcare and autonomous driving, while encouraging additional investigation into the significance of interpretability in enhancing dependable image editing models.
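The idea of attributing a model's output to individual words of the instruction can be illustrated with a very simple perturbation scheme. The sketch below is not SMILE's procedure, which the abstract does not detail; it substitutes a plain leave-one-word-out comparison against a black-box scoring function. The names `word_influence` and `toy_score`, and the word weights inside `toy_score`, are hypothetical stand-ins for a real image editing model and an image-difference measure.

```python
def word_influence(instruction, score_fn):
    """Estimate each word's influence by removing it and re-scoring.

    score_fn is treated as a black box that maps an instruction string to
    a number. The influence of a word is the drop in score when that word
    is left out of the instruction.
    """
    words = instruction.split()
    baseline = score_fn(" ".join(words))
    influence = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        influence.append((word, baseline - score_fn(perturbed)))
    return influence


def toy_score(text):
    """Hypothetical black box: pretends some words change the output more."""
    weights = {"snowy": 2.0, "mountain": 1.0}
    return sum(weights.get(w.lower(), 0.0) for w in text.split())


for word, delta in word_influence("make the mountain snowy", toy_score):
    print(f"{word}: {delta}")
```

Per-word values like these are the kind of quantity that could be rendered as a heatmap over the instruction text; a model-agnostic method needs only the ability to query the model, not access to its internals.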

data mining, machine learning, natural language

arXiv:2412.16277

Country: Europe > United Kingdom (0.28)

Genre: Research Report > Promising Solution (0.65)

Industry:

Media > Photography (1.00)
Information Technology (0.88)
Health & Medicine > Diagnostic Medicine (0.67)
Transportation > Ground > Road (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
